Search | VHL Regional Portal

1.

The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species.

Putman, Tim E; Schaper, Kevin; Matentzoglu, Nicolas; Rubinetti, Vincent P; Alquaddoomi, Faisal S; Cox, Corey; Caufield, J Harry; Elsarboukh, Glass; Gehrke, Sarah; Hegde, Harshad; Reese, Justin T; Braun, Ian; Bruskiewich, Richard M; Cappelletti, Luca; Carbon, Seth; Caron, Anita R; Chan, Lauren E; Chute, Christopher G; Cortes, Katherina G; De Souza, Vinícius; Fontana, Tommaso; Harris, Nomi L; Hartley, Emily L; Hurwitz, Eric; Jacobsen, Julius O B; Krishnamurthy, Madan; Laraway, Bryan J; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra A T; Mullen, Kathleen R; O'Neil, Shawn T; Shefchek, Kent A; Stefancsik, Ray; Toro, Sabrina; Vasilevsky, Nicole A; Walls, Ramona L; Whetzel, Patricia L; Osumi-Sutherland, David; Smedley, Damian; Robinson, Peter N; Mungall, Christopher J; Haendel, Melissa A; Munoz-Torres, Monica C.

Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.

Subject(s)

Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics

2.

Predicting nutrition and environmental factors associated with female reproductive disorders using a knowledge graph and random forests.

Chan, Lauren E; Casiraghi, Elena; Putman, Tim; Reese, Justin; Harmon, Quaker E; Schaper, Kevin; Hedge, Harshad; Valentini, Giorgio; Schmitt, Charles; Motsinger-Reif, Alison; Hall, Janet E; Mungall, Christopher J; Robinson, Peter N; Haendel, Melissa A.

medRxiv ; 2023 Jul 16.

Article in English | MEDLINE | ID: mdl-37502882

ABSTRACT

Objective: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (e.g., endometriosis, ovarian cyst, and uterine fibroids). Materials and Methods: We harmonized survey data from the Personalized Environment and Genes Study on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. Results: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. Discussion: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal, but can support hypothesis generation. Conclusion: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.

3.

KG-Hub-building and exchanging biological knowledge graphs.

Caufield, J Harry; Putman, Tim; Schaper, Kevin; Unni, Deepak R; Hegde, Harshad; Callahan, Tiffany J; Cappelletti, Luca; Moxon, Sierra A T; Ravanmehr, Vida; Carbon, Seth; Chan, Lauren E; Cortes, Katherina; Shefchek, Kent A; Elsarboukh, Glass; Balhoff, Jim; Fontana, Tommaso; Matentzoglu, Nicolas; Bruskiewich, Richard M; Thessen, Anne E; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J; Reese, Justin T.

Bioinformatics ; 39(7)2023 07 01.

Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.

Subject(s)

Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning

4.

The Ontology of Biological Attributes (OBA)-computational traits for the life sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn L; Bello, Susan M; Caron, Anita R; Chesler, Elissa J; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

Mamm Genome ; 34(3): 364-378, 2023 09.

Article in English | MEDLINE | ID: mdl-37076585

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

Subject(s)

Biological Ontologies , Biological Science Disciplines , Genome-Wide Association Study , Phenotype

5.

The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn; Bello, Susan M; Caron, Anita R; Chessler, Elissa; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

bioRxiv ; 2023 Jan 27.

Article in English | MEDLINE | ID: mdl-36747660

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

6.

Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.

Unni, Deepak R; Moxon, Sierra A T; Bada, Michael; Brush, Matthew; Bruskiewich, Richard; Caufield, J Harry; Clemons, Paul A; Dancik, Vlado; Dumontier, Michel; Fecho, Karamarie; Glusman, Gustavo; Hadlock, Jennifer J; Harris, Nomi L; Joshi, Arpita; Putman, Tim; Qin, Guangrong; Ramsey, Stephen A; Shefchek, Kent A; Solbrig, Harold; Soman, Karthik; Thessen, Anne E; Haendel, Melissa A; Bizon, Chris; Mungall, Christopher J.

Clin Transl Sci ; 15(8): 1848-1855, 2022 08.

Article in English | MEDLINE | ID: mdl-36125173

ABSTRACT

Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.

Subject(s)

Pattern Recognition, Automated , Translational Science, Biomedical , Knowledge

7.

A Simple Standard for Sharing Ontological Mappings (SSSOM).

Matentzoglu, Nicolas; Balhoff, James P; Bello, Susan M; Bizon, Chris; Brush, Matthew; Callahan, Tiffany J; Chute, Christopher G; Duncan, William D; Evelo, Chris T; Gabriel, Davera; Graybeal, John; Gray, Alasdair; Gyori, Benjamin M; Haendel, Melissa; Harmse, Henriette; Harris, Nomi L; Harrow, Ian; Hegde, Harshad B; Hoyt, Amelia L; Hoyt, Charles T; Jiao, Dazhi; Jiménez-Ruiz, Ernesto; Jupp, Simon; Kim, Hyeongsik; Koehler, Sebastian; Liener, Thomas; Long, Qinqin; Malone, James; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra; Munoz-Torres, Monica C; Osumi-Sutherland, David; Overton, James A; Peters, Bjoern; Putman, Tim; Queralt-Rosinach, Núria; Shefchek, Kent; Solbrig, Harold; Thessen, Anne; Tudorache, Tania; Vasilevsky, Nicole; Wagner, Alex H; Mungall, Christopher J.

Database (Oxford) ; 20222022 05 25.

Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec.

Subject(s)

Metadata , Semantic Web , Data Management , Databases, Factual , Workflow

8.

The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Shefchek, Kent A; Harris, Nomi L; Gargano, Michael; Matentzoglu, Nicolas; Unni, Deepak; Brush, Matthew; Keith, Daniel; Conlin, Tom; Vasilevsky, Nicole; Zhang, Xingmin Aaron; Balhoff, James P; Babb, Larry; Bello, Susan M; Blau, Hannah; Bradford, Yvonne; Carbon, Seth; Carmody, Leigh; Chan, Lauren E; Cipriani, Valentina; Cuzick, Alayne; Della Rocca, Maria; Dunn, Nathan; Essaid, Shahim; Fey, Petra; Grove, Chris; Gourdine, Jean-Phillipe; Hamosh, Ada; Harris, Midori; Helbig, Ingo; Hoatlin, Maureen; Joachimiak, Marcin; Jupp, Simon; Lett, Kenneth B; Lewis, Suzanna E; McNamara, Craig; Pendlington, Zoë M; Pilgrim, Clare; Putman, Tim; Ravanmehr, Vida; Reese, Justin; Riggs, Erin; Robb, Sofia; Roncaglia, Paola; Seager, James; Segerdell, Erik; Similuk, Morgan; Storm, Andrea L; Thaxon, Courtney; Thessen, Anne; Jacobsen, Julius O B.

Nucleic Acids Res ; 48(D1): D704-D715, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31701156

ABSTRACT

In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regards to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.

Subject(s)

Computational Biology/methods , Genotype , Phenotype , Algorithms , Animals , Biological Ontologies , Databases, Genetic , Exome , Genetic Association Studies , Genetic Variation , Genomics , Humans , Internet , Software , Translational Research, Biomedical , User-Computer Interface

9.

ChlamBase: a curated model organism database for the Chlamydia research community.

Putman, Tim; Hybiske, Kevin; Jow, Derek; Afrasiabi, Cyrus; Lelong, Sebastien; Cano, Marco Alvarado; Stupp, Gregory S; Waagmeester, Andra; Good, Benjamin M; Wu, Chunlei; Su, Andrew I.

Database (Oxford) ; 20192019 01 01.

Article in English | MEDLINE | ID: mdl-31211397

10.

ChlamBase: a curated model organism database for the Chlamydia research community.

Putman, Tim; Hybiske, Kevin; Jow, Derek; Afrasiabi, Cyrus; Lelong, Sebastien; Cano, Marco Alvarado; Wu, Chunlei; Su, Andrew I.

Database (Oxford) ; 20192019 01 01.

Article in English | MEDLINE | ID: mdl-30985891

ABSTRACT

The accelerating growth of genomic and proteomic information for Chlamydia species, coupled with unique biological aspects of these pathogens, necessitates bioinformatic tools and features that are not provided by major public databases. To meet these growing needs, we developed ChlamBase, a model organism database for Chlamydia that is built upon the WikiGenomes application framework, and Wikidata, a community-curated database. ChlamBase was designed to serve as a central access point for genomic and proteomic information for the Chlamydia research community. ChlamBase integrates information from numerous external databases, as well as important data extracted from the literature that are otherwise not available in structured formats that are easy to use. In addition, a key feature of ChlamBase is that it empowers users in the field to contribute new annotations and data as the field advances with continued discoveries. ChlamBase is freely and publicly available at chlambase.org.

Subject(s)

Chlamydia , Data Curation , Databases, Genetic , Chlamydia/classification , Chlamydia/genetics , Chlamydia/metabolism , Genomics , Proteomics

11.

WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata.

Putman, Tim E; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Diesh, Colin; Dunn, Nathan; Munoz-Torres, Monica; Stupp, Gregory S; Wu, Chunlei; Su, Andrew I; Good, Benjamin M.

Database (Oxford) ; 2017(1)2017 01 01.

Article in English | MEDLINE | ID: mdl-28365742

ABSTRACT

With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here, we describe WikiGenomes (wikigenomes.org), a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers the individual genomic researcher to contribute their expertise to the curation effort and integrates the knowledge into Wikidata, enabling it to be accessed by anyone without restriction. Database URL: www.wikigenomes.org.

Subject(s)

Databases, Nucleic Acid , Genome , Internet , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/standards

12.

Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.

Putman, Tim E; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Wu, Chunlei; Su, Andrew I; Good, Benjamin M.

Database (Oxford) ; 20162016.

Article in English | MEDLINE | ID: mdl-27022157

ABSTRACT

The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources from government-funded institutions such as National Center for Biotechnology Information (NCBI) and UniProt to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback to large scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and their scope is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains and the gene and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacteriaChlamydia trachomatis.Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their gene and gene products, totaling â¼900,000 additional entities. This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both the data and domain expert.

Subject(s)

Data Curation , Genome, Microbial , Models, Theoretical , Female , Gene Ontology , Genes, Bacterial , Humans , Molecular Sequence Annotation , Operon/genetics , Search Engine

13.

Wikidata as a semantic framework for the Gene Wiki initiative.

Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira; Turner, Julia; Putman, Tim; Leong, Justin; Naik, Chinmay; Pavlidis, Paul; Schriml, Lynn; Good, Benjamin M; Su, Andrew I.

Database (Oxford) ; 20162016.

Article in English | MEDLINE | ID: mdl-26989148

ABSTRACT

Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.

Subject(s)

Databases, Nucleic Acid , Semantics , Animals , Humans , Mice , Models, Theoretical , Search Engine

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL